This notebook is arranged in cells. Text is usually written in the Markdown cells, where you can use HTML tags (to make it bold, italic, colored, etc.). You can double-click on this cell to see the formatting.
Ellipses (...) are provided where you are expected to write your solution, but feel free to adjust the template (within reason) if this style is not to your taste.
Hit "Shift-Enter" on a code cell to evaluate it. Double click a Markdown cell to edit.
import numpy as np
from scipy.integrate import quad
import sklearn as sk
from sklearn import datasets, linear_model
from sklearn.preprocessing import PolynomialFeatures
#For plotting
import matplotlib.pyplot as plt
%matplotlib inline
Mount your Google Drive on your runtime using an authorization code.
Note: When using the 'Mount Drive' button in the file browser, no authentication codes are necessary for notebooks that have only been edited by the current user.
#plot goodies
def canvas_ticks(obj):
    '''Add inward-facing ticks to a blank canvas. For single plots,
    pass plt as the argument; for subplots (e.g. when using gridspec),
    pass the axes object, e.g. ax1.'''
    obj.minorticks_on()
    obj.tick_params(labelsize=14)
    obj.tick_params(labelbottom=True, labeltop=False, labelright=False, labelleft=True)
    obj.tick_params(direction='in', which='minor', length=5, bottom=True, top=True, left=True, right=True)
    obj.tick_params(direction='in', which='major', length=10, bottom=True, top=True, left=True, right=True)
plt.style.use('default')
plt.rcParams['mathtext.fontset'] = 'stix'
plt.rcParams['font.family'] = 'STIXGeneral'
plt.rcParams['font.size'] = 14
error_bar_settings = {
    'fmt': 'o',
    'ms': 6,
    # 'mfc': plot_color,
    'ecolor': 'black',
    'mec': 'black',
    'capsize': 2.,
    'mew': .5,
    'elinewidth': .7,
    # 'alpha': 0.85,
}
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
(Reference - NR 15.4) We fit a set of 50 data points $(x_i, y_i)$ to a polynomial $y(x) = a_0 + a_1x + a_2x^2 + a_3x^3$. (Note that this problem is linear in $a_i$ but nonlinear in $x_i$). The uncertainty $\sigma_i$ associated with each measurement $y_i$ is known, and we assume that the $x_i$'s are known exactly. To measure how well the model agrees with the data, we use the chi-square merit function:
$$ \chi^2 = \sum_{i=0}^{N-1} \left( \frac{y_i-\sum_{k=0}^{M-1}a_k x_i^k}{\sigma_i} \right)^2, $$
where N = 50 and M = 4. Here, $1, x, ... , x^3$ are the basis functions.
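As a concrete reference point, this merit function can be evaluated directly; a minimal sketch (the function name and arrays are illustrative):

```python
import numpy as np

def chi_square(a, x, y, sigma):
    """Chi-square merit function for the polynomial model y(x) = sum_k a_k x^k."""
    # np.polyval expects the highest-order coefficient first, so reverse a
    y_model = np.polyval(a[::-1], x)
    return np.sum(((y - y_model) / sigma)**2)
```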
1. Plot data. (Hint - https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.errorbar.html)
Again, in this and all future assignments, you are expected to show error bars in all figures if the data include uncertainties. You will lose points if error bars are not shown.
# Load a given 2D data
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW3/Problem1_data.dat")
x = data[:,0]
y = data[:,1]
sig_y = data[:,2]
# Make plot
plt.figure(figsize = (10,7), dpi = 200)
# Scatter plot
plt.errorbar(x,y,yerr = sig_y, **error_bar_settings,)
canvas_ticks(plt)
plt.xlabel("x")
plt.ylabel("y")
We will pick as best parameters those that minimize $\chi^2$.
First, let $\bf A$ be a matrix whose $N \times M$ components are constructed from the $M$ basis functions evaluated at the $N$ abscissas $x_i$, and from the $N$ measurement errors $\sigma_i$, by the prescription
$$ A_{ij} = \frac{X_j(x_i)}{\sigma_i} $$
where $X_0(x) = 1,\ X_1(x) = x,\ X_2(x) = x^2,\ X_3(x) = x^3$. We call this matrix $\bf A$ the design matrix.
Also, define a vector $\bf b$ of length $N$ by
$$ b_i = \frac{y_i}{\sigma_i} $$
and denote the $M$ vector whose components are the parameters to be fitted ($a_0, a_1, a_2, a_3$) by $\bf a$.
2. Define the design matrix A. (Hint: Its dimension should be NxM = 50x4.) Also, define the vector b. Print the first row of A.
N = 50
M = 4
# Define A
A = np.zeros(shape=(N,M))
for i in range(N):
    for j in range(M):
        A[i,j] = x[i]**j / sig_y[i]
# Define b
b = y/sig_y
# Print the first row of A
A[0,:]
array([0.30368985, 0.60162612, 1.1918541 , 2.36112786])
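The double loop above works, but the same design matrix can also be built in one broadcasted expression; a sketch assuming the same x, sig_y, and M (the helper name is illustrative):

```python
import numpy as np

def design_matrix(x, sigma, M):
    """N x M design matrix with A[i, j] = x[i]**j / sigma[i], via broadcasting."""
    # x[:, None] has shape (N, 1); np.arange(M) has shape (M,);
    # broadcasting produces the (N, M) array of powers x_i**j
    return x[:, None]**np.arange(M) / sigma[:, None]
```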
Minimize $\chi^2$ by requiring that its derivative with respect to each of the $M$ parameters $a_k$ vanishes. This condition yields the matrix equation
$$ \sum_{j=0}^{M-1} \alpha_{kj}a_j = \beta_k$$
where $\bf \boldsymbol \alpha = A^T \cdot A$ and $\bf \boldsymbol \beta = A^T \cdot b$ ($\boldsymbol \alpha$ is an $M \times M$ matrix, and $\boldsymbol \beta$ is a vector of length $M$). This is the normal equation of the least squares problem. In matrix form, the normal equations can be written as:
$$ \bf \boldsymbol \alpha \cdot a = \boldsymbol \beta. $$
This can be solved for the vector of parameters $\bf a$ by linear algebra numerical methods.
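As a sanity check on the hand-written solvers below, the normal equations can also be handed to a library routine; a sketch on synthetic stand-in data (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A_demo = rng.normal(size=(50, 4))     # stand-in design matrix
b_demo = rng.normal(size=50)          # stand-in data vector
alpha_demo = A_demo.T @ A_demo        # M x M normal matrix
beta_demo = A_demo.T @ b_demo         # right-hand side of length M
# Solve the normal equations directly ...
a_demo = np.linalg.solve(alpha_demo, beta_demo)
# ... and compare with lstsq, which avoids forming alpha explicitly
# (numerically safer when alpha is ill-conditioned)
a_lstsq, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)
assert np.allclose(a_demo, a_lstsq)
```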
3. Define the matrix alpha and vector beta. Print both alpha and beta.
# Transpose of the matrix A
A_transpose = A.T
# alpha matrix
alpha = np.dot(A_transpose,A, )
# stuff changes with alpha
Alpha = alpha.copy()
# beta vector
beta = np.dot(A_transpose, b )
# Print alpha and beta
print("Alpha:\n",alpha)
print("Beta:\n",beta)
Alpha: [[7.57344292e+00 1.59581405e+01 1.20838371e+02 4.80208969e+02] [1.59581405e+01 1.20838371e+02 4.80208969e+02 3.20704253e+03] [1.20838371e+02 4.80208969e+02 3.20704253e+03 1.62887892e+04] [4.80208969e+02 3.20704253e+03 1.62887892e+04 1.05149671e+05]] Beta: [ 118.53904396 727.88040211 3581.30337095 22023.93157276]
4. We have $ \bf \boldsymbol \alpha \cdot a = \boldsymbol \beta. $ Solve for $\bf a$ using (1) the "GaussianElimination_pivot" function defined below and (2) LU decomposition with forward substitution and backsubstitution. Plot the best-fit line on top of the data.
Hint: You can use scipy.linalg.lu to do the LU decomposition. After you do "L, U = lu(A, permute_l=True)," print the L and U matrices. Note that L is not a lower triangular matrix. Swap rows of L (and B) to make it lower triangular. Then, solve for y in Ly = B.
e.g. If your L matrix is the following:
[[ 0.01577114 0.10593754 0.41569921 1. ] [ 0.03323166 -0.04364428 1. 0. ] [ 0.25163705 1. 0. 0. ] [ 1. 0. 0. 0. ]],
you can change it to this:
[[ 1. 0. 0. 0. ] [ 0.25163705 1. 0. 0. ] [ 0.03323166 -0.04364428 1. 0. ] [ 0.01577114 0.10593754 0.41569921 1. ]]
Then, you should also change B from
[ 118.53904396 727.88040211 3581.30337095 22023.93157276]
to
[22023.93157276 3581.30337095 727.88040211 118.53904396].
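If you prefer to let scipy keep track of the permutation for you, scipy.linalg.lu_factor and lu_solve bundle the pivot bookkeeping; a small hedged cross-check on a toy system:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A_toy = np.array([[4., 3.],
                  [6., 3.]])
b_toy = np.array([10., 12.])
lu_piv = lu_factor(A_toy)         # combined LU factors plus pivot indices
x_toy = lu_solve(lu_piv, b_toy)   # forward + back substitution in one call
assert np.allclose(A_toy @ x_toy, b_toy)
```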
def GaussianElimination_pivot(A, b):
    N = len(b)
    for m in range(N):
        # Partial pivoting: swap in the row below with the larger pivot element
        for i in range(m+1, N):
            if abs(A[m,m]) < abs(A[i,m]):
                A[[m,i],:] = A[[i,m],:]
                b[[m,i]] = b[[i,m]]
        # Divide by the diagonal element
        div = A[m,m]
        A[m,:] /= div
        b[m] /= div
        # Now subtract from the lower rows
        for i in range(m+1, N):
            mult = A[i,m]
            A[i,:] -= mult*A[m,:]
            b[i] -= mult*b[m]
    # Backsubstitution
    x = np.empty(N, float)
    for m in range(N-1, -1, -1):
        x[m] = b[m]
        for i in range(m+1, N):
            x[m] -= A[m,i]*x[i]
    return x
# Using the Gaussian elimination with partial pivoting
a = GaussianElimination_pivot(alpha, beta)
print('Using Gaussian Elimination:')
print('a0 =', a[0], ', a1 =', a[1], ', a2 =', a[2], ', a3 =', a[3])
Using Gaussian Elimination: a0 = -0.030816295372758873 , a1 = 2.667646082249882 , a2 = 0.31483927007853074 , a3 = 0.07945935335134478
# "lu" does LU decomposition with pivot. Reference - https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.linalg.lu_factor.html
from scipy.linalg import lu
def solve_lu_pivot(A, B):
    # LU decomposition with the pivoting folded into L (permute_l=True)
    L, U = lu(A, permute_l=True)
    # "Forward" step: with permute_l=True, L carries the permutation and is
    # generally not lower triangular, so solve Ly = B by direct inversion
    L_inv = np.linalg.inv(L)
    y = np.dot(L_inv, B)
    # Backward step: solve Ux = y for x
    U_inv = np.linalg.inv(U)
    x = np.dot(U_inv, y)
    return x
aL = solve_lu_pivot(alpha, beta)
print('Using LU Decomposition:')
print('a0 =', aL[0], ', a1 =', aL[1], ', a2 =', aL[2], ', a3 =', aL[3])
Using LU Decomposition: a0 = -0.030816295372751767 , a1 = 2.667646082249882 , a2 = 0.31483927007853074 , a3 = 0.07945935335134478
x_model = np.linspace(min(x), max(x), 1000)
# Make plot
plt.figure(figsize = (10,7), dpi = 200)
# Scatter plot
plt.errorbar(x,y,yerr = sig_y, **error_bar_settings,)
poly_a = np.poly1d(a[::-1])
poly_aL = np.poly1d(aL[::-1])
plt.plot(x_model, poly_a(x_model), 'k',label = 'Gaussian Elimination')
plt.plot(x_model, poly_aL(x_model), 'r--', label = 'LU decomposition')
# plt.plot()
canvas_ticks(plt)
plt.xlabel("x")
plt.ylabel("y")
plt.legend(frameon = False)
plt.show()
The inverse matrix $\bf C = \boldsymbol \alpha^{-1}$ is called the covariance matrix, which is closely related to the probable uncertainties of the estimated parameters $\bf a$. To estimate these uncertainties, we compute the variance associated with the estimate $a_j$. Following NR p.790, we obtain:
$$ \sigma^2(a_j) = \sum_{k=0}^{M-1} \sum_{l=0}^{M-1} C_{jk} C_{jl} \alpha_{kl} = C_{jj} $$
5. Compute the error (standard deviation - square root of the variance) on the fitted parameters using the covariance matrix.
from scipy.linalg import inv
# Covariance matrix
cov = inv(Alpha)
sigma_a0 = np.sqrt(cov[0,0])
sigma_a1 = np.sqrt(cov[1,1])
sigma_a2 = np.sqrt(cov[2,2])
sigma_a3 = np.sqrt(cov[3,3])
print('Error: on a0 =', sigma_a0, ', on a1 =', sigma_a1, ', on a2 =', sigma_a2, ', on a3 =', sigma_a3)
Error: on a0 = 0.7139418927087803 , on a1 = 0.22390274023290208 , on a2 = 0.06344814294696766 , on a3 = 0.01199869685671923
Now, instead of using the normal equations, we use singular value decomposition (SVD) to find the least-squares solution. Please read Ch. 15 of NR for more details. Remember that we have the $N \times M$ design matrix $\bf A$ and the vector $\bf b$ of length $N$. We wish to find $\bf a$ which minimizes $\chi^2 = |\bf A \cdot a - b|^2$.
Using SVD, we can decompose $\bf A$ as the product of an $N \times M$ column-orthogonal matrix $\bf U$, an $M \times M$ diagonal matrix $\bf S$ (with positive or zero elements - the "singular" values), and the transpose of an $M \times M$ orthogonal matrix $\bf V$. ($\bf A = USV^{T}$).
Let $\bf U_{(i)}$ and $\bf V_{(i)}$ denote the columns of $\bf U$ and $\bf V$ respectively. (Note: $\bf U$ has $M$ columns of length $N$, while $\bf V$ has $M$ columns of length $M$.) $\bf S_{(i,i)}$ are the $i$th diagonal elements (singular values) of $\bf S$. Then, the solution of the above least squares problem can be written as:
$$ \bf a = \sum_{i=1}^M \big( \frac{U_{(i)} \cdot b}{S_{(i,i)}} \big) V_{(i)}. $$
The variance in the estimate of a parameter $a_j$ is given by:
$$ \sigma^2(a_j) = \sum_{i=1}^M \big( \frac{V_{ji}}{S_{ii}} \big)^2 $$
and the covariance:
$$ \mathrm{Cov}(a_j, a_k) = \sum_{i=1}^M \big( \frac{V_{ji}V_{ki}}{S_{ii}^2} \big). $$
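In matrix form these sums read $\bf C = V\, \mathrm{diag}(1/S_{ii}^2)\, V^T$, which must agree with $(\bf A^T A)^{-1}$ from the normal-equations route; a sketch verifying this on a random stand-in design matrix (names are illustrative):

```python
import numpy as np

def covariance_from_svd(V, S):
    """Covariance matrix C = V diag(1/S**2) V^T from the SVD of the design matrix."""
    # Dividing the columns of V by S**2 implements V @ diag(1/S**2)
    return (V / S**2) @ V.T

# Cross-check on a random design matrix: C must equal (A^T A)^{-1}
rng = np.random.default_rng(1)
A_toy = rng.normal(size=(10, 3))
U_toy, S_toy, VT_toy = np.linalg.svd(A_toy, full_matrices=False)
C_toy = covariance_from_svd(VT_toy.T, S_toy)
```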
6. Decompose the design matrix A using SVD. Estimate the parameter $a_i$'s and its variance.
# Reference - https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.svd.html
from scipy.linalg import svd
# Decompose A
# Note: S, in this case, is a vector of length M, which contains the singular values.
U, S, VT = svd(A, full_matrices=False)
V = VT.T
print(U[:,0].shape)
# Solve for a
a_ = []
for i in range(M):
    a_i = np.dot(U[:,i], b)/S[i] * V[:,i]  # [row, column]
    a_.append(a_i)
a_from_SVD = np.sum(a_, axis = 0)
print('Using SVD:')
print('a0 =', a_from_SVD[0], ', a1 =', a_from_SVD[1], ', a2 =', a_from_SVD[2], ', a3 =', a_from_SVD[3])
(50,) Using SVD: a0 = -0.030816295372707803 , a1 = 2.6676460822498824 , a2 = 0.3148392700785285 , a3 = 0.07945935335134474
# Error on a
sigma = np.zeros(M)
for i in range(M):
    sigma_i = (V[:,i]/S[i])**2
    sigma += sigma_i
sigma_a_SVD = np.sqrt(sigma)
print('Error: on a0 =', sigma_a_SVD[0], ', on a1 =', sigma_a_SVD[1], ', on a2 =', sigma_a_SVD[2], ', on a3 =', sigma_a_SVD[3])
Error: on a0 = 0.7139418927087802 , on a1 = 0.22390274023290271 , on a2 = 0.06344814294696767 , on a3 = 0.011998696856719273
Suppose that you are only interested in the parameters $a_0$ and $a_1$. We can plot the 2-dimensional confidence region ellipse for these parameters by building the covariance matrix:
$$ \mathrm{C'} = \begin{pmatrix} \sigma(a_0)^2 & \mathrm{Cov}(a_0, a_1) \\ \mathrm{Cov}(a_0, a_1) & \sigma(a_1)^2 \end{pmatrix} $$
The lengths of the ellipse axes are the square root of the eigenvalues of the covariance matrix, and we can calculate the counter-clockwise rotation of the ellipse with the rotation angle:
$$ \theta = \frac{1}{2} \mathrm{arctan}\Big( \frac{2\cdot \mathrm{Cov}({a_0, a_1})}{\sigma({a_0})^2-\sigma({a_1})^2} \Big) = \mathrm{arctan}(\frac{\vec{v_1}(y)}{\vec{v_1}(x)}) $$
where $\vec{v_1}$ is the eigenvector with the largest eigenvalue. So we calculate the angle of the largest eigenvector towards the x-axis to obtain the orientation of the ellipse.
Then, we multiply the axis lengths by some factor depending on the confidence level we are interested in. For 68%, this scale factor is $\sqrt{\Delta \chi^2} \approx 1.52$. For 95%, it is $\approx 2.48$.
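These scale factors follow from the $\chi^2$ distribution with 2 degrees of freedom (68.3% and 95.4% are the 1$\sigma$ and 2$\sigma$ joint-confidence levels for two parameters); a sketch reproducing them with scipy.stats:

```python
import numpy as np
from scipy.stats import chi2

# Delta chi^2 thresholds for a joint confidence region of 2 parameters
scale_68 = np.sqrt(chi2.ppf(0.683, df=2))   # ~1.52 (1-sigma)
scale_95 = np.sqrt(chi2.ppf(0.954, df=2))   # ~2.48 (2-sigma)
```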
7. Compute the covariance between $a_0$ and $a_1$. Plot the 68% and 95% confidence region of the parameter $a_0$ and $a_1$. The skeleton code is already provided.
from matplotlib.patches import Ellipse
import matplotlib as mpl
from numpy.linalg import eigvals
cov = np.zeros(2)
for i in range(4):
    cov_i = V[0,i] * V[1,i] / S[i]**2
    cov += cov_i
cov
array([-0.05496243, -0.05496243])
# Build the covariance matrix
CovM = np.array([[sigma_a_SVD[0]**2, cov[1]],
[cov[0], sigma_a_SVD[1]**2]])
CovM
array([[ 0.50971303, -0.05496243],
[-0.05496243, 0.05013244]])
# Plot the confidence region (https://stackoverflow.com/questions/32371996/python-matplotlib-how-to-plot-ellipse)
eigvec, eigval, u = np.linalg.svd(CovM)
# Major-axis length (twice the semi-major axis)
semimaj = np.sqrt(eigval[0])*2.
# Minor-axis length (twice the semi-minor axis)
semimin = np.sqrt(eigval[1])*2.
# Angle of the largest eigenvector (first column of eigvec) w.r.t. the x-axis
theta = np.arctan2(eigvec[1][0], eigvec[0][0])
# Plot 1-sig confidence region
ell = mpl.patches.Ellipse(xy=[a[0], a[1]], width=1.52*semimaj, height=1.52*semimin, angle = theta*180/np.pi, facecolor = 'dodgerblue', edgecolor = 'royalblue', label = '68% confidence')
# Plot 2-sig confidence region
ell2 = mpl.patches.Ellipse(xy=[a[0], a[1]], width=2.48*semimaj, height=2.48*semimin, angle = theta*180/np.pi, facecolor = 'skyblue', edgecolor = 'royalblue', label = '95% confidence')
fig, ax = plt.subplots(figsize=(7,7))
ax.add_patch(ell2)
ax.add_patch(ell)
# Set bounds for x,y axes
bounds = np.sqrt(CovM.diagonal())
plt.xlim(a[0]-4*bounds[0], a[0]+4*bounds[0])
plt.ylim(a[1]-4*bounds[1], a[1]+4*bounds[1])
plt.grid(True)
plt.xlabel('$a_0$')
plt.ylabel('$a_1$')
plt.legend()
plt.show()
In lecture, we discussed that we fit the existing data to obtain model parameters in data analysis, while in machine learning we use the model derived from the existing data to make prediction for new data.
Next, let us take the given data and do the polynomial regression.
First, split the sample into training data and testing data. Keep 80% of the data as training data and use the remaining 20% for testing.
8. Often, the data can be ordered in a specific manner, hence shuffle the data prior to splitting it into training and testing samples. (Use https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.shuffle.html)
rand = np.arange(N)
np.random.shuffle(rand)
x, y, sig_y = x[rand], y[rand], sig_y[rand]
train_x, test_x = np.split(x, [int(len(x) * 0.8)])
train_y, test_y = np.split(y, [int(len(y) * 0.8)])
train_sigy, test_sigy = np.split(sig_y, [int(len(sig_y) * 0.8)])
In the case of polynomial regression, we need to generate polynomial features (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) for preprocessing. Note that we call each term in the polynomial a "feature" of our model, and here we generate the features' high-order (and interaction) terms. For example, suppose we set the degree of the polynomial to be 3. Then, the features of $X$ are transformed from $(X)$ to $(1, X, X^2, X^3)$. We can do this transform using PolynomialFeatures.fit_transform(train_x). But fit_transform() takes a numpy array of shape [n_samples, n_features], so you need to re-define the training set as train_set_prep = train_x[:,np.newaxis] so that it has the shape [40,1].
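A quick sketch of what this transform produces for a single feature value:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3)
# fit_transform expects shape (n_samples, n_features); one sample with x = 2
features = poly.fit_transform(np.array([[2.0]]))
# features == [[1., 2., 4., 8.]], i.e. the basis (1, x, x^2, x^3)
```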
9. Define three different polynomial models with degree of 1, 3, 10. (e.g. model = PolynomialFeatures(degree=...) ) Then, fit to data and transform it using "fit_transform"
# e.g.
model_deg1 = PolynomialFeatures(degree = 1)
X_model_deg_1 = model_deg1.fit_transform(train_x[:,np.newaxis])
model_deg3 = PolynomialFeatures(degree = 3)
X_model_deg_3 = model_deg3.fit_transform(train_x[:,np.newaxis])
model_deg10 = PolynomialFeatures(degree = 10)
X_model_deg_10 = model_deg10.fit_transform(train_x[:,np.newaxis])
Then, do the least squares linear regression. (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit)
10. Do the linear regression for three different polynomial models defined in Part 9. Plot the fit on top of the training data (Label each curve).
#deg 1
LR_deg1 = linear_model.LinearRegression()
LR_deg1.fit(X_model_deg_1, train_y)
X_deg1_sample = np.linspace(-5, 7, 100)
X_deg1_sample_transform = model_deg1.fit_transform(X_deg1_sample[:,np.newaxis])
Y_deg1_sample = LR_deg1.predict(X_deg1_sample_transform)
#deg 3
LR_deg3 = linear_model.LinearRegression()
LR_deg3.fit(X_model_deg_3, train_y)
X_deg3_sample = np.linspace(-5, 7, 100)
X_deg3_sample_transform = model_deg3.fit_transform(X_deg3_sample[:,np.newaxis])
Y_deg3_sample = LR_deg3.predict(X_deg3_sample_transform)
#deg 10
LR_deg10 = linear_model.LinearRegression()
LR_deg10.fit(X_model_deg_10, train_y)
X_deg10_sample = np.linspace(-5, 7, 100)
X_deg10_sample_transform = model_deg10.fit_transform(X_deg10_sample[:,np.newaxis])
Y_deg10_sample = LR_deg10.predict(X_deg10_sample_transform)
# Make plot
plt.figure(figsize = (10, 7))
plt.plot(X_deg1_sample, Y_deg1_sample, label = "Deg 1")
plt.plot(X_deg3_sample, Y_deg3_sample, label = "Deg 3")
plt.plot(X_deg10_sample, Y_deg10_sample, label = "Deg 10")
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.xlim(-5, 7)
plt.ylim(-12, 65)
plt.legend(frameon = False)
canvas_ticks(plt)
plt.show()
11. Plot the fit on top of the test data (Label each curve).
# Make plot
plt.figure(figsize = (10, 7))
plt.errorbar(test_x, test_y, test_sigy, **error_bar_settings)
plt.plot(X_deg1_sample, Y_deg1_sample, label = "Deg 1")
plt.plot(X_deg3_sample, Y_deg3_sample, label = "Deg 3")
plt.plot(X_deg10_sample, Y_deg10_sample, label = "Deg 10")
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.xlim(-5, 7)
plt.ylim(-12, 65)
plt.legend(frameon = False)
canvas_ticks(plt)
plt.show()
You can obtain the estimated linear coefficients using linear_model.LinearRegression.coef_ (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)
12. Print the linear coefficients of three polynomial models you used. For the polynomial of degree 10, do you see that high-order coefficients are very small?
print(f"The coef for deg 1:\n {LR_deg1.coef_}")
print(f"The coef for deg 3:\n {LR_deg3.coef_}")
print(f"The coef for deg 10:\n {LR_deg10.coef_}")
The coef for deg 1: [0. 5.15206924] The coef for deg 3: [0. 2.71126276 0.34082605 0.07220783] The coef for deg 10: [ 0.00000000e+00 3.03479834e+00 -2.45664835e+00 7.87278718e-02 6.05911657e-01 -5.79006451e-02 -4.15401822e-02 7.00303668e-03 6.70493974e-04 -2.08349279e-04 1.17336787e-05]
The following analysis is based on https://arxiv.org/pdf/1208.4122.pdf.
"Principal Component Analysis (PCA) is a powerful and widely used technique to analyze data by forming a custom set of "principal component" eigenvectors that are optimized to describe the most data variance with the fewest number of components. With the full set of eigenvectors the data may be reproduced exactly, i.e., PCA is a transformation which can lend insight by identifying which variations in a complex dataset are most significant and how they are correlated. Alternately, since the eigenvectors are optimized and sorted by their ability to describe variance in the data, PCA may be used to simplify a complex dataset into a few eigenvectors plus coefficients, under the approximation that higher-order eigenvectors are predominantly describing fine tuned noise or otherwise less important features of the data." (S. Bailey, arXiv: 1208.4122)
In this problem, we take the quasar (QSO) spectra from the Sloan Digital Sky Survey (SDSS) and apply PCA to them. Filtering for high $S/N$ in order to apply the standard PCA, we select 18 high-$S/N$ spectra of QSOs with redshift 2.0 < z < 2.1, trimmed to $1340 < \lambda < 1620\ \mathring{A}$.
# Load data
wavelength = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW3/Problem2_wavelength.txt")
flux = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW3/Problem2_QSOspectra.txt")
# Data dimension
print( np.shape(wavelength) )
print( np.shape(flux) )
(824,) (18, 824)
In the above cell, we load the following data: wavelength in Angstroms ("wavelength") and 2D array of spectra x fluxes ("flux").
We have 824 wavelength bins, so "flux" is 18 $\times$ 824 matrix, each row containing fluxes of different QSO spectra.
1. Plot any three QSO spectra flux as a function of wavelength. (In order to better see the features of QSO spectra, you may plot them with some offsets.)
# Make plot
plt.figure(figsize = (8, 4), dpi = 200)
plt.xlabel('Wavelength $[ \AA] $')
plt.ylabel('$f_{\lambda}+$ constant')
for i in range(3):
    plt.plot(wavelength, flux[i]+i, label = f'QSO {i+1} (offset +{i})')
plt.legend(frameon = False)
canvas_ticks(plt)
plt.show()
"Flux" is the data matrix of order 18 $\times$ 824. Call this matrix $\bf X$.
We can construct the covariance matrix $\bf C$ using the mean-centered data matrix. First, calculate the mean of each column and subtract it from that column. Let $\bf X_c$ denote the mean-centered data matrix.
$\bf X_c$ $ =
\begin{bmatrix}
x_{(1,1)} - \overline{x}_1 & x_{(1,2)} - \overline{x}_2 & \dots & x_{(1,824)} - \overline{x}_{824} \\
x_{(2,1)} - \overline{x}_1 & x_{(2,2)} - \overline{x}_2 & \dots & x_{(2,824)} - \overline{x}_{824} \\
\vdots & \vdots & \vdots & \vdots \\
x_{(18,1)} - \overline{x}_1 & x_{(18,2)} - \overline{x}_2 & \dots & x_{(18,824)} - \overline{x}_{824}
\end{bmatrix}$
where $x_{m,n}$ denote the flux of $m$th QSO in $n$th wavelength
bin, and $\overline{x}_k$ is the mean flux in $k$th wavelength bin.
Then, the covariance matrix is:
$\bf C$ $ = \frac{1}{N-1}$ $\bf X_c^T X_c.$ ($N$ is the number of QSOs.)
2. Find the covariance matrix C using the data matrix flux.
X_C = flux - np.mean(flux, axis=0)
C = 1/(len(flux)-1)*np.matmul(X_C.T, X_C)
C.shape
(824, 824)
3. Using numpy.linalg, find eigenvalues and eigenvectors of the covariance matrix. Order the eigenvalues from largest to smallest and then plot them as a function of the number of eigenvalues. (Remember that the eigenvector with the highest eigenvalue is the principal component of the data set.)
In this case, we find that our covariance matrix is a rank-17 matrix, so we only select the 17 largest eigenvalues and the corresponding eigenvectors (the other eigenvalues are close to zero).
NOTE: Here, by ranking the eigenvalues based on their magnitudes, you basically rank them in order of significance. You should show that the first few components are dominant, accounting for most of the variability in the data. So you can plot eigenvalues as a function of component number (1,2,3,...,17)
from numpy.linalg import eigh
# C is symmetric, so use eigh: eigenvalues come out real, in ascending order
eigenvalues, eigenvectors = eigh(C)
# Reverse to put the largest eigenvalue first. The eigenvectors are the
# *columns* of the returned matrix, so reorder along axis 1, not axis 0.
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]
# Make plot
N_eig = np.arange(len(eigenvalues))
plt.figure(figsize = (3.5,3.5), dpi = 200)
plt.scatter(N_eig[:17], eigenvalues[:17])
plt.ylabel('Eigen Value')
plt.xlabel('Component N')
canvas_ticks(plt)
plt.show()
4. Plot the first three eigenvectors. These eigenvectors
represent the principal variations of the spectra with respect to that mean spectrum.
eigenvectors.shape
(824, 824)
plt.figure(figsize = (8,3.5), dpi = 200)
for i in range(3):
    plt.plot(wavelength, eigenvectors[:,i], lw = 1, label = f'$\mathbf{{v_{i+1}}}$')
plt.ylabel('Eigenvector component')
plt.xlabel('Wavelength $[\AA]$')
canvas_ticks(plt)
plt.legend(frameon = False)
plt.show()
The eigenvectors indicate the directions of the principal components, so we can re-orient the data onto the new axes by multiplying the original mean-centered data by the eigenvectors. We call the re-oriented data "PC scores." (Call the PC score matrix $\bf Z$.) Suppose that we have $k$ eigenvectors. Construct the matrix of eigenvectors $\bf V = [v_1 v_2 ... v_k]$, with $\bf v_i$ the $i$th highest eigenvector. Then, we can get the 18 $\times\ k$ PC score matrix by multiplying the 18 $\times$ 824 data matrix with the 824 $\times\ k$ eigenvector matrix:
$$ \bf Z = X_c V $$
Then, we can reconstruct the data by mapping it back to 824 dimensions with $\bf V^T$:
$$ \bf \hat{X} = \boldsymbol \mu + Z V^T $$
where $\boldsymbol \mu$ is the vector of mean QSO flux.
Now, comparing the original data with the reconstructed data, we can calculate the residuals. Let $\bf X_{(i)}, \hat{X}_{(i)}$ denote the rows of $\bf X, \hat{X}$ respectively. Remember that the data matrix has the dimension 18 $\times$ 824, so each row $\bf X_{(i)}$ corresponds to the spectrum of one particular QSO. (For example, if you wish to see the QSO spectrum in row 7, you can plot $\bf X_{(7)}$ as a function of wavelength.) Then, we can simply calculate the residual as $\frac{1}{N} \sum_{i=1}^N \bf |\hat{X}_{(i)} - X_{(i)}|^2$ where $N$ is the total number of QSOs. (NOTE: $\bf |\hat{X}_{(i)} - X_{(i)}|$ is the magnitude of the difference between the two vectors $\bf \hat{X}_{(i)}$ and $\bf X_{(i)}$.)
5. First, start with only mean flux value $\boldsymbol \mu$ (in this case $\bf \hat{X} = \boldsymbol \mu, V = 0$) and calculate the residual. Then, do the reconstruction using the first two principal eigenvectors $\bf V = [v_1 v_2]$ and calculate the residual. Finally, let $\bf V = [v_1 v_2 ... v_6]$ (the first six principal eigenvectors) and compute the residual.
V = eigenvectors[:,:6]
Z = np.dot(X_C, V)
# X_C.shape
Z.shape, V.T.shape
Xhat = np.mean(flux, axis =0) +np.dot(Z, V.T)
residual = flux - Xhat
def QSO_residuals(num_eigen_vect):
    V = eigenvectors[:, :num_eigen_vect]
    Z = np.dot(X_C, V)
    if num_eigen_vect == 0:
        # Mean-only model: Xhat = mu
        Xhat = np.mean(flux, axis=0)
    else:
        Xhat = np.mean(flux, axis=0) + np.dot(Z, V.T)
    # Residual: mean squared magnitude of the row differences
    residual = np.sum((flux - Xhat)**2) / len(flux)
    return residual, Xhat
residuals_0eigen, model_0eigen = QSO_residuals(0)
residuals_2eigen, model_2eigen = QSO_residuals(2)
residuals_6eigen, model_6eigen = QSO_residuals(6)
for i in [0, 2, 6]:
    print(f"The residual for {i} principal eigenvectors is {QSO_residuals(i)[0]}")
The residual for 0 principal eigenvectors is 7.8382003427995715 The residual for 2 principal eigenvectors is 4.370432340894141 The residual for 6 principal eigenvectors is 3.4558940203861805
model_0eigen.shape
(824,)
6. For any two QSO spectra, plot the original and reconstructed spectra using the first six principal eigenvectors.
plt.figure(figsize = (10,5), dpi = 200)
plt.plot(wavelength, flux[0], label = 'Original')
plt.plot(wavelength, model_6eigen[0], label = 'Reconstruction')
plt.ylabel('$f_\lambda$')
plt.xlabel('Wavelength $\AA$')
canvas_ticks(plt)
plt.legend(frameon = False)
plt.show()
plt.figure(figsize = (10,5), dpi = 200)
plt.plot(wavelength, flux[1], label = 'Original')
plt.plot(wavelength, model_6eigen[1], label = 'Reconstruction')
plt.ylabel('$f_\lambda$')
plt.xlabel('Wavelength $\AA$')
canvas_ticks(plt)
plt.legend(frameon = False)
plt.show()
7. Plot the residual as a function of the number of included eigenvectors (1,2,3,...,17).
np.abs(QSO_residuals(0)[0])
7.8382003427995715
plt.figure(figsize = (3.5,3.5), dpi = 200)
for i in range(1, 18):
    plt.scatter(i, QSO_residuals(i)[0], color = 'k')
plt.ylabel('Residual')
plt.xlabel('Number of eigenvectors')
canvas_ticks(plt)
# plt.legend(frameon = False)
plt.show()
In this problem, we only have 18 QSO spectra, so the idea of using PCA may seem silly. We can also use SVD to find eigenvalues and eigenvectors. With SVD, we get $\bf X_c = USV^T$. Then, the covariance matrix is $\bf C$ $ = \frac{1}{N-1}$ $\bf X_c^T X_c$ $ = \frac{1}{N-1}$ $\bf VS^2V^T.$ Then, the eigenvalues are the squared singular values scaled by the factor $\frac{1}{N-1}$ and the eigenvectors are the columns of $\bf V$.
8. Find the eigenvalues applying SVD to the mean-centered data matrix $\bf X_c$.
from scipy.linalg import svd
U, S, VT = svd(X_C )
V = VT.T
# Print Eigenvalues
eigenvalues = S**2/(len(flux[:,0])-1)
eigenvalues
array([3.57183360e+00, 1.16921314e+00, 8.19491399e-01, 6.43886056e-01,
4.87138879e-01, 3.16043520e-01, 2.72246202e-01, 1.83227778e-01,
1.41340818e-01, 1.35417557e-01, 1.24349547e-01, 9.68536857e-02,
8.91735508e-02, 8.05492370e-02, 6.53953675e-02, 5.46083371e-02,
4.85022745e-02, 2.72391755e-30])
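The agreement between the eigen-decomposition and SVD routes can be checked on random stand-in data; a sketch (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
Xc_toy = rng.normal(size=(18, 30))
Xc_toy -= Xc_toy.mean(axis=0)               # mean-center the columns
n_qso = Xc_toy.shape[0]
# Eigen-decomposition of the covariance matrix (eigh: symmetric input,
# ascending order, so reverse to descending)
eigvals_toy = np.linalg.eigh(Xc_toy.T @ Xc_toy / (n_qso - 1))[0][::-1]
# Squared singular values of the mean-centered data, scaled by 1/(N-1)
svdvals_toy = np.linalg.svd(Xc_toy, compute_uv=False)**2 / (n_qso - 1)
# After centering, the rank is at most N-1, so compare the leading N-1 values
```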
In Assignment 2, we used the UMAP module to reduce the MNIST dataset to 2 dimensions (from 784) for easy visualization and observed that different classes (10 digits - "0", "1", ..., "9") got separated nicely into clusters when the MNIST data are embedded into lower dimensions by UMAP.
In this exercise, instead of the UMAP module, we use the PCA method for dimensionality reduction.
As mentioned in Problem 2, PCA is a technique for reducing the number of dimensions in a dataset while retaining most of the information. It uses the correlations between dimensions to provide a minimum number of variables that keeps the maximum amount of variation, i.e. information about how the original data are distributed. It does this not by guesswork but with hard mathematics, using the eigenvalues and eigenvectors of the data matrix. The eigenvectors of the covariance matrix have the property that they point along the major directions of variation in the data: the directions of maximum variance in the dataset. Here, we use the scikit-learn implementation of PCA: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
First, load the MNIST data:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
X = mnist.data
Y = mnist.target
X = X.to_numpy()
Y = Y.to_numpy()
"$X$" contains information about the given MNIST digits. We have a 28x28 pixel grid, so each image is a vector of length 784; we have 70,000 images (digits), so $X$ is a 70,000x784 matrix. "$Y$" is a label (0-9; the category to which each image belongs) vector of length 70,000.
1. Do the following:
(1) Randomly shuffle data (i.e. randomize the order)
(Note: The label $Y_1$ corresponds to a vector $X_{1j}$, and even after shuffling, $Y_1$ should still correspond to $X_{1j}$.)
(2) Select 1/3 of the data. (You are free to work with a larger set of the data, but it will take much longer time to train.)
(3) Split data into training and test samples using train_test_split (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). Set train_size = 0.8 (80% of $X$ becomes our training sample). Print the dimensions of the training and test samples.
from sklearn.model_selection import train_test_split
# Labels come as strings; convert to numeric
Y = Y.astype(float)
# Generate shuffled indices, then keep every third sample (1/3 of the data)
rand = np.random.permutation(len(X))
X = X[rand][::3]
Y = Y[rand][::3]
x_train, x_test, y_train, y_test = train_test_split(X, Y, train_size=0.8)
print(x_train.shape, x_test.shape)
Many machine learning algorithms are not scale invariant, so we need to bring the different features onto a uniform scale; this falls under preprocessing the data (http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing). PCA is a prime example of when such normalization matters: if the variables are not measured on the same scale, a single variable can dominate each principal component.
In this exercise, the MNIST pixel values in images should also be scaled prior to providing the images as an input to PCA. There are three main types of pixel scaling techniques: normalization (scaling pixel to the range 0-1), centering (scale pixel values to have a zero-mean), and standardization (scale pixel values to have a zero-mean and unit-variance).
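The three rescalings can be compared on a toy pixel vector. A sketch with plain NumPy:

```python
import numpy as np

pixels = np.array([0., 64., 128., 255.])

normalized = pixels / 255.0                             # range [0, 1]
centered = pixels - pixels.mean()                       # zero mean
standardized = (pixels - pixels.mean()) / pixels.std()  # zero mean, unit variance

print(normalized.min(), normalized.max())   # 0.0 1.0
print(round(centered.mean(), 10))           # 0.0
print(round(standardized.std(), 10))        # 1.0
```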
First, let us try normalization. Each pixel contains a greyscale value quantified by an integer between 0 and 255. To normalize the dataset, we rescale the "$X$" data to the interval [0, 1].
2. Normalize the X data (both training and test).
x_train, x_test = x_train/255, x_test/255
Next, using scikit-learn's PCA module, we can select the first two principal components from the original 784 dimensions.
from sklearn.decomposition import PCA
(1) Define the PCA model with the first 2 principal components:
pca = PCA(n_components=2)
(2) Using "fit_transform," fit the model with the training X data and apply the dimensionality reduction on it.
X_train_PCA = pca.fit_transform(training X data)
(3) With the same model, apply the dimensionality reduction on the test X data.
X_test_PCA = pca.transform(test X data)
3. This problem is similar to HW2-Q4-Part3. For both training and test samples, create a scatterplot of the first and second principal component and color each of the different types of digits with a different color. Label each axis (e.g. x-axis: 1st principal component, y-axis: 2nd principal component). How does it compare to the UMAP results?
pca = PCA(n_components=2)
X_train_PCA = pca.fit_transform(x_train)
X_test_PCA = pca.transform(x_test)
fig, ax = plt.subplots(1, 2, figsize=(10, 5), dpi = 200)
# Scatter plot for the training data with color bar
scatter_train = ax[0].scatter(X_train_PCA[:, 0], X_train_PCA[:, 1], c=y_train, cmap="tab10")
ax[0].set_title("Training Data")
plt.colorbar(scatter_train, ax=ax[0], pad = 0.01,)
canvas_ticks(ax[0])
# Scatter plot for the test data with color bar
scatter_test = ax[1].scatter(X_test_PCA[:, 0], X_test_PCA[:, 1], c=y_test, cmap="tab10")
ax[1].set_title("Test Data")
plt.colorbar(scatter_test, ax=ax[1],pad = 0.01,)
plt.tight_layout()
canvas_ticks(ax[1])
plt.show()
4. Select the first three principal components and make a 3D scatterplot of the training data (similar to HW2-Q4-Part5).
pca = PCA(n_components=3)
X_train_PCA = pca.fit_transform(x_train)
X_test_PCA = pca.transform(x_test)
from mpl_toolkits.mplot3d import Axes3D
a, b, c = X_train_PCA[:,0], X_train_PCA[:,1], X_train_PCA[:,2]
A, B, C = X_test_PCA[:,0], X_test_PCA[:,1], X_test_PCA[:,2]
# Create a 3D plot
fig = plt.figure(figsize = (8,5), dpi = 200)
ax = fig.add_subplot(111, projection='3d')
surf = ax.scatter(a, b, c, c=y_train, cmap="tab10")
ax.set_title("Training Data")
# Add a color bar
fig.colorbar(surf, ax=ax, pad = 0.1,shrink=0.8)
plt.tight_layout()
plt.show()
# Create a 3D plot
fig = plt.figure(figsize = (8,5), dpi = 200)
ax = fig.add_subplot(111, projection='3d')
surf = ax.scatter(A, B, C, c=y_test, cmap="tab10")
ax.set_title("Test Data")
# Add a color bar
fig.colorbar(surf, ax=ax, pad = 0.1,shrink=0.8)
plt.tight_layout()
plt.show()
From the graphs we can see that two or three components definitely hold some information, especially for specific digits, but clearly not enough to set all of them apart. Other techniques, such as UMAP or t-SNE (t-distributed Stochastic Neighbor Embedding), can better reduce the dimensions for visualisation.
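For comparison, a minimal t-SNE sketch using scikit-learn's implementation (here on the small built-in 8x8 digits set rather than full MNIST, since t-SNE is slow on large data):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()                      # small 8x8 digit set, stands in for MNIST
X_small, y_small = digits.data[:200], digits.target[:200]

# Embed to 2 dimensions for plotting
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_small)     # (200, 2) array

print(embedding.shape)
# plt.scatter(embedding[:, 0], embedding[:, 1], c=y_small, cmap="tab10")
```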
from sklearn.neighbors import KNeighborsClassifier as knn
Now, we will introduce K-nearest neighbors (KNN), one of the most widely used machine learning classification techniques. We use the scikit-learn implementation of KNN: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
Ideally, we should tune KNN hyperparameters by doing a grid search using k-fold cross validation, but in this exercise we simply use default parameters with n_neighbors = 6.
(1) Define the knn classifier
clf = knn(n_neighbors=6)
(2) Fit the model
clf.fit(training X data, training Y/target data)
(3) Get the classification accuracy on the test data
clf.score(test X data, test Y/target data)
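The k-fold grid search mentioned above could be sketched as follows (on the small built-in digits set; the candidate n_neighbors values are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()

# 5-fold cross-validated search over the number of neighbors
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [2, 4, 6, 8, 10]},
    cv=5,
)
grid.fit(digits.data, digits.target)

print(grid.best_params_)            # best k found by cross-validation
print(round(grid.best_score_, 3))   # mean CV accuracy at the best k
```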
5. Evaluate the classification accuracy on the test data using a KNN classifier.
clf = knn(n_neighbors=6)
clf.fit(x_train, y_train.astype(int))
clf_sore = clf.score(x_test, y_test.astype(int))
print(clf_sore)
0.9588600814227555
The above KNN classifier considers all 784 features for each image when making its decisions. What if you do not need that many? It is possible that a lot of those features do not really affect our predictions that much. Or worse, KNN could be considering feature anomalies that are unique to our training data, resulting in overfitting. One way to deal with this is by removing features that aren’t contributing much.
Now, suppose you take the first two principal components from PCA and fit your model using those two components.
pca = PCA(n_components=2)
X_train_PCA = pca.fit_transform(training X data)
X_test_PCA = pca.transform(test X data)
Now you can take X_train_PCA, along with training Y data, to fit the KNN model and evaluate the classification accuracy.
6. Evaluate the classification accuracy with different numbers of PCA components. Let N_PCA_component = [2, 10, 20, 30, 40, 50, 60, 100, 200, 300, 400, 700]. Plot classification accuracy vs. number of PCA components. How does it compare to the accuracy in Part 5? Draw a horizontal line at the accuracy obtained with all 784 features.
N_PCA_component = [2, 10, 20, 30, 40, 50, 60, 100, 200, 300, 400, 700]
score = []
for i in N_PCA_component:
    pca = PCA(n_components=i)
    X_train_PCA = pca.fit_transform(x_train)
    X_test_PCA = pca.transform(x_test)
    clf = knn(n_neighbors=6)
    clf.fit(X_train_PCA, y_train.astype(int))
    s = clf.score(X_test_PCA, y_test.astype(int))
    score.append(s)
score
[0.42082708377973, 0.918577244482537, 0.9614313263338333, 0.9697878722948361, 0.9700021427040926, 0.9682879794300407, 0.9689307906578102, 0.9674308977930148, 0.9637883008356546, 0.9610027855153204, 0.959931433469038, 0.9588600814227555]
plt.figure(figsize = (3.5,3.5), dpi = 200)
plt.plot(N_PCA_component, score, '.-', color = 'k')
plt.axhline(y = clf_sore)
plt.ylabel('Classification Accuracy')
plt.xlabel('N PCA Components')
canvas_ticks(plt)
# plt.legend(frameon = False)
plt.show()
7. Instead of the PCA method, fit the UMAP model with the training data and do unsupervised learning. Reduce data to 2 dimensions (embed to 2 dimensions) and train the KNN model on the embedded training data. Compared to Part 6, does it give you a higher classification accuracy even with n_component = 2?
!pip install umap-learn
import umap
model = umap.UMAP(n_components=2)
embedding_train = model.fit_transform(x_train)  # unsupervised: no labels passed
embedding_test = model.transform(x_test)
clf = knn(n_neighbors=6)
clf.fit(embedding_train, y_train.astype(int))
clf.score(embedding_test, y_test.astype(int))
0.947075208913649
Instead of pixel normalization, we can also try feature rescaling through standardization (rescaling the features such that they have the properties of a standard normal distribution with a mean of zero and a standard deviation of one). We can use sklearn.preprocessing.StandardScaler for this job.
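A quick check of what StandardScaler does per feature (a sketch on a tiny matrix):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 100.],
              [50., 200.],
              [100., 300.]])

sc = StandardScaler()
X_scaled = sc.fit_transform(X)   # each column: zero mean, unit variance

print(X_scaled.mean(axis=0))     # ~[0, 0]
print(X_scaled.std(axis=0))      # [1, 1]
```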
from sklearn.preprocessing import StandardScaler
(1) Define the StandardScaler
sc = StandardScaler()
(2) Fit the training X data and then transform it.
X_train = sc.fit_transform(training X data)
(3) Perform standardization on the test X data.
X_test = sc.transform(test X data)
8. Re-load the MNIST data (Repeat Part 1) and try standardization on both training and test X data following the above steps. Evaluate the classification accuracy using a KNN classifier. How does it compare to Part 5?
sc = StandardScaler()
X_train = sc.fit_transform(x_train)
X_test = sc.transform(x_test)
clf = knn(n_neighbors=6)
clf.fit(X_train, y_train.astype(int))
clf.score(X_test, y_test.astype(int))
0.9200771373473323
Standardization does slightly worse than normalization: we get 92.0% accuracy here, versus 95.9% in Part 5.
9. Again, take the data from Part 8 (standardized X data) and do unsupervised learning using the UMAP module. Reduce the data to 2 dimensions, and plot the embedding as a scatterplot (for the training data) and color by the target array. How does it compare to HW2-Q5f? Which pixel rescaling method do you think works better?
model = umap.UMAP(n_components=2)
embedding_train = model.fit_transform(X_train)
embedding_test = model.transform(X_test)
fig, ax = plt.subplots(1, 2, figsize=(10, 5), dpi = 200)
# Scatter plot for the training data with color bar
scatter_train = ax[0].scatter(embedding_train[:, 0], embedding_train[:, 1],s = 1, c=y_train, cmap="tab10")
ax[0].set_title("Training Data")
plt.colorbar(scatter_train, ax=ax[0], pad = 0.01,)
canvas_ticks(ax[0])
# Scatter plot for the test data with color bar
scatter_test = ax[1].scatter(embedding_test[:, 0], embedding_test[:, 1], s= 1, c=y_test, cmap="tab10")
ax[1].set_title("Test Data")
plt.colorbar(scatter_test, ax=ax[1],pad = 0.01,)
plt.tight_layout()
canvas_ticks(ax[1])
plt.show()
!jupyter nbconvert --to html "/content/drive/My Drive/P188_288/P188_288_HW/HW1_188.ipynb"